Cross-lingual transfer learning and multitask learning for capturing multiword expressions
This is an accepted manuscript of an article published by Association for Computational Linguistics in Proceedings of the Joint Workshop on Multiword Expressions and WordNet (MWE-WN 2019), available online: https://www.aclweb.org/anthology/W19-5119
The accepted version of the publication may differ from the final published version. Recent developments in deep learning have prompted a surge of interest in the application of multitask and transfer learning to NLP problems. In this study, we explore, for the first time, the application of transfer learning (TRL) and multitask learning (MTL) to the identification of Multiword Expressions (MWEs). For MTL, we exploit the shared syntactic information between MWE and dependency parsing models to jointly train a single model on both tasks. We specifically predict two types of labels: MWE and dependency parse. Our neural MTL architecture utilises the supervision of dependency parsing in lower layers and predicts MWE tags in upper layers. In the TRL scenario, we overcome the scarcity of data by learning a model on a larger MWE dataset and transferring the knowledge to a resource-poor setting in another language. In both scenarios, the resulting models achieved higher performance compared to standard neural approaches.
Advances in automatic terminology processing: methodology and applications in focus
A thesis submitted in partial fulfilment of the requirements of the University of Wolverhampton for the degree of Doctor of Philosophy. The information and knowledge era, in which we are living, creates challenges in many fields, and terminology is not an exception. The challenges include an exponential growth in the number of specialised documents that are available, in which terms are presented, and the number of newly introduced concepts and terms, which are already beyond our (manual) capacity. A promising solution to this ‘information overload’ would be to employ automatic or semi-automatic procedures to enable individuals and/or small groups to efficiently build high quality terminologies from their own resources which closely reflect their individual objectives and viewpoints. Automatic terminology processing (ATP) techniques have already proved to be quite reliable, and can save human time in terminology processing. However, they are not without weaknesses, one of which is that these techniques often consider terms to be independent lexical units satisfying some criteria, when terms are, in fact, integral parts of a coherent system (a terminology). This observation is supported by the discussion of the notion of terms and terminology and the review of existing approaches in ATP presented in this thesis. In order to overcome the aforementioned weakness, we propose a novel methodology in ATP which is able to extract a terminology as a whole. The proposed methodology is based on knowledge patterns automatically extracted from glossaries, which we considered to be valuable but overlooked resources. These automatically identified knowledge patterns are used to extract terms, their relations and descriptions from corpora. The extracted information can facilitate the construction of a terminology as a coherent system.
The study also aims to discuss applications of ATP, and describes an experiment in which ATP is integrated into a new NLP application: multiple-choice test item generation. The successful integration of the system shows that ATP is a viable technology, and should be exploited more by other NLP applications.
Mutual terminology extraction using a statistical framework
In this paper, we explore a statistical framework for mutual bilingual terminology extraction. We propose three probabilistic models to assess the proposition that automatic alignment can play an active role in bilingual terminology extraction and translate it into mutual bilingual terminology extraction. The results indicate that such models are valid and show that mutual bilingual terminology extraction is indeed a viable approach.
Cognitive processing of multiword expressions in native and non-native speakers of English: evidence from gaze data
Gaze data has been used to investigate the cognitive processing of certain types of formulaic language such as idioms and binominal phrases. However, very little is known about the online cognitive processing of multiword expressions. In this paper we use gaze features to compare the processing of verb-particle and verb-noun multiword expressions to control phrases of the same part-of-speech pattern. We also compare the gaze data for certain components of these expressions and the control phrases in order to find out whether these components are processed differently from the whole units. We provide results for both native and non-native speakers of English and we analyse the importance of the various gaze features for the purpose of this study. We discuss our findings in light of the E-Z model of reading.
Using gaze data to predict multiword expressions
In recent years gaze data has been increasingly used to improve and evaluate NLP models due to the fact that it carries information about the cognitive processing of linguistic phenomena. In this paper we conduct a preliminary study towards the automatic identification of multiword expressions based on gaze features from native and non-native speakers of English. We report comparisons between a part-of-speech (POS) and frequency baseline to: i) a prediction model based solely on gaze data and ii) a combined model of gaze data, POS and frequency. In spite of the challenging nature of the task, best performance was achieved by the latter. Furthermore, we explore how the type of gaze data (from native versus non-native speakers) affects the prediction, showing that data from the two groups is discriminative to an equal degree. Finally, we show that late processing measures are more predictive than early ones, which is in line with previous research on idioms and other formulaic structures.
Automatic question answering for medical MCQs: Can it go further than information retrieval?
We present a novel approach to automatic question answering that does not depend on the performance of an information retrieval (IR) system and does not require training data. We evaluate the system performance on a challenging set of university-level medical science multiple-choice questions. Best performance is achieved when combining a neural approach with an IR approach, both of which work independently. Unlike previous approaches, the system achieves statistically significant improvement over the random guess baseline even for questions that are labeled as challenging based on the performance of baseline solvers.
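One way to check a claim of statistically significant improvement over random guessing is a one-sided exact binomial test. The sketch below is only an illustration: the abstract does not say which test was used, and the function name `binomial_p_value`, the number of questions, the number of answer options, and the number of correct answers are all hypothetical.

```python
from math import comb


def binomial_p_value(n_questions: int, n_correct: int, n_options: int) -> float:
    """One-sided exact binomial test against a uniform random-guess baseline.

    Returns the probability of answering at least `n_correct` of
    `n_questions` correctly when every answer is a uniform random
    guess among `n_options` choices.
    """
    p = 1.0 / n_options  # chance of guessing a single question correctly
    return sum(
        comb(n_questions, k) * p**k * (1.0 - p) ** (n_questions - k)
        for k in range(n_correct, n_questions + 1)
    )


# Hypothetical figures, not taken from the paper:
# 100 questions, 5 options each, 35 answered correctly.
p_value = binomial_p_value(n_questions=100, n_correct=35, n_options=5)
print(f"p-value vs. random guessing: {p_value:.4g}")
```

A small p-value means the observed accuracy would be unlikely under pure guessing. With five options the random-guess baseline is 20% accuracy.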
SYNTHESIS OF COPPER-BASED NANOPARTICLE CATALYSTS BY DIFFERENT METHODS FOR TOTAL OXIDATION OF VOC
In this paper, the preparation of 10 wt.% Cu/γ-Al2O3 catalysts by different methods was studied. The changes in structure and texture of the catalysts were examined by X-ray diffraction (XRD), transmission electron microscopy (TEM) and Fourier-transform infrared spectroscopy (FT-IR). The activity of the catalysts was investigated in the complete gas-phase oxidation of VOCs (toluene and n-butanol) over the Cu/γ-Al2O3 catalyst. The results showed that the catalyst prepared by non-thermal plasma (NTP) was superior to those obtained by impregnation (WI) and deposition-precipitation (DP), as the size of its copper nanoparticles enhanced copper dispersion and selectivity. The total oxidation of VOCs to CO2 and H2O was achieved above 275 °C. Compared to the WI and DP methods, the NTP method increased the oxidation efficiency by 15-30%.
Corpora for Computational Linguistics
Since the mid-1990s, corpora have become very important for computational linguistics. This paper offers a survey of how they are currently used in different fields of the discipline, with particular emphasis on anaphora and coreference resolution, automatic summarisation and term extraction. Their influence on other fields is also briefly discussed.
Double RIS-Assisted MIMO Systems Over Spatially Correlated Rician Fading Channels and Finite Scatterers
This paper investigates double RIS-assisted MIMO communication systems over Rician fading channels with finite scatterers, spatial correlation, and the existence of a double-scattering link between the transceivers. First, the statistical information is derived in closed form for the aggregated channels, unveiling various influences of the system and environment on the average channel power gains. Next, we study two active and passive beamforming designs corresponding to two objectives. The first problem maximizes channel capacity by jointly optimizing the active precoding and combining matrices at the transceivers and passive beamforming at the double RISs subject to the transmit power constraint. To tackle the inherent non-convexity, we propose an efficient alternating optimization (AO) algorithm based on the alternating direction method of multipliers (ADMM). The second problem enhances communication reliability by jointly training the encoder and decoder at the transceivers and the phase shifters at the RISs. An end-to-end learning framework, in which each system entity is represented by a neural network, is proposed to minimize the symbol error rate of the detected symbols by controlling the transceivers and the RIS phase shifts. Numerical results verify our analysis and demonstrate the superior improvements of phase shift designs to boost system performance.
Comment: 15 pages, 9 figures, accepted by IEEE Transactions on Communications
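The general pattern behind alternating optimization is to update one block of variables while holding the others fixed, then swap. The sketch below shows this on a toy two-variable convex objective. It is not the paper's ADMM-based algorithm or its capacity objective: the function names, the objective, and the starting point are assumptions chosen so that each sub-problem has a closed-form solution.

```python
def objective(x: float, y: float) -> float:
    """Toy convex objective (an assumption, not the paper's capacity problem)."""
    return (x - 1.0) ** 2 + (y - 2.0) ** 2 + x * y


def alternating_minimize(x: float, y: float, iterations: int = 50) -> tuple[float, float]:
    """Minimize `objective` by exact minimization over one variable at a time."""
    for _ in range(iterations):
        # Fix y, solve d/dx = 2(x - 1) + y = 0 for x.
        x = 1.0 - y / 2.0
        # Fix x, solve d/dy = 2(y - 2) + x = 0 for y.
        y = 2.0 - x / 2.0
    return x, y


# Hypothetical starting point.
x_opt, y_opt = alternating_minimize(5.0, 5.0)
print(x_opt, y_opt, objective(x_opt, y_opt))
```

Because this toy objective is strictly convex, the alternating updates converge to its unique minimizer at x = 0, y = 2. For non-convex problems such as the one in the abstract, alternating schemes generally guarantee only a stationary point.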